Lower Bounds for Bayes Error Estimation

Authors

  • András Antos
  • Luc Devroye
  • László Györfi
Abstract

We give a short proof of the following result. Let $(X, Y)$ be any distribution on $\mathbb{N} \times \{0, 1\}$, and let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be an i.i.d. sample drawn from this distribution. In discrimination, the Bayes error $L^* = \inf_g P\{g(X) \neq Y\}$ is of crucial importance. Here we show that without further conditions on the distribution of $(X, Y)$, no rate-of-convergence results can be obtained. Let $\tilde{L}_n(X_1, Y_1, \ldots, X_n, Y_n)$ be an estimate of the Bayes error, and let $\{\tilde{L}_n(\cdot)\}$ be a sequence of such estimates. For any sequence $\{a_n\}$ of positive numbers converging to zero, a distribution of $(X, Y)$ may be found such that

$E\{|L^* - \tilde{L}_n(X_1, Y_1, \ldots, X_n, Y_n)|\} \geq a_n$ infinitely often.

Index Terms: Discrimination, statistical pattern recognition, nonparametric estimation, Bayes error, lower bounds, rates of convergence.

1 INTRODUCTION

The pattern recognition problem may be formulated as follows: we are given $n$ i.i.d. observations $D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, drawn from the common unknown distribution of $(X, Y)$ on $\mathbb{R}^d \times \{0, 1\}$. Given $X$, one must estimate $Y$ as well as possible by a function $g_n(X)$ of $X$ and the observations. The best one can hope for is to make an error equal to the Bayes error $L^*$:

$L_n \overset{\text{def}}{=} P\{g_n(X) \neq Y \mid D_n\} \geq L^* \overset{\text{def}}{=} \inf_{g : \mathbb{R}^d \to \{0, 1\}} P\{g(X) \neq Y\}.$

It is thus of great importance to be able to estimate $L^*$ accurately, even before pattern recognition is attempted. Also, a comparison of estimates of $L_n$ and $L^*$ gives us an idea of how much room is left for improvement.

In a first group of methods, $L^*$ is estimated by an estimate $\hat{L}_n$ of the error probability $L_n$ of some consistent classification rule $g_n$. As such, this problem has been attempted by Fukunaga and Kessel [9], Chen and Fu [2], Fukunaga and Hummels [8], and Garnett and Yau [10], to cite just the early contributions. Concerning the error estimation of specific classification rules, see Chapter 10 of McLachlan [11]. Clearly, if the estimate $\hat{L}_n$ we use is consistent in the sense that $\hat{L}_n - L_n \to 0$ with probability one as $n \to \infty$, and the rule is strongly consistent, then $\hat{L}_n \to L^*$ with probability one. In other words, we have a consistent estimate of the Bayes error probability. The problem is that even though for many classifiers $\hat{L}_n - L_n$ can be guaranteed to converge to zero rapidly, regardless of what the distribution of $(X, Y)$ is (see Chapters 8, 23, 24, and 31 of Devroye et al. [7]), in view of Cover [3] and Devroye [4], the rate of convergence of $L_n$ to $L^*$ using such a method may be arbitrarily slow. Thus, we cannot expect good performance for all distributions from such a method.

The question thus is whether it is possible to come up with another method of estimating $L^*$ (by $\tilde{L}_n(X_1, Y_1, \ldots, X_n, Y_n)$) such that the difference $\tilde{L}_n(X_1, Y_1, \ldots, X_n, Y_n) - L^*$ converges to zero rapidly for all distributions. Unfortunately, there is no method that guarantees a certain finite-sample performance for all distributions. This disappointing fact is reflected in the following negative result (Theorem 8.5 of Devroye et al. [7]).

Theorem 1. For every $n$, for any estimate $\tilde{L}_n(X_1, Y_1, \ldots, X_n, Y_n)$ of the Bayes error probability $L^*$, and for every $\epsilon > 0$, there exists a distribution of $(X, Y)$ such that

$E\{|\tilde{L}_n(X_1, Y_1, \ldots, X_n, Y_n) - L^*|\} \geq 1/4 - \epsilon.$

The counterexamples in Theorem 1 vary with $n$, so it may still be possible that for every fixed distribution of $(X, Y)$ there exists a universal rate of convergence to zero for $E\{|\tilde{L}_n(X_1, Y_1, \ldots, X_n, Y_n) - L^*|\}$. The purpose of this note is to show that this too is impossible. We show the following:

Theorem 2. For any sequence $\{a_n\}$ of positive numbers converging to zero, a distribution of $(X, Y)$ on $\{1, 2, 3, \ldots\} \times \{0, 1\}$ may be found such that

$E\{|\tilde{L}_n(X_1, Y_1, \ldots, X_n, Y_n) - L^*|\} \geq a_n$ infinitely often.
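
To make the first group of methods concrete, here is a minimal sketch (not from the paper) of plug-in Bayes error estimation on a synthetic discrete distribution: the true $L^*$ is computed from the known a posteriori probabilities, and the estimate is the holdout error of a simple majority-vote rule. The distribution, sample sizes, and choice of rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete distribution on {0, ..., m-1} x {0, 1}:
# uniform marginal for X and random a posteriori probabilities eta(x).
m = 20
eta = rng.uniform(size=m)                          # eta(x) = P(Y = 1 | X = x)
bayes_error = np.mean(np.minimum(eta, 1.0 - eta))  # L* = E[min(eta(X), 1 - eta(X))]

def sample(n):
    """Draw n i.i.d. pairs (X_i, Y_i) from the distribution above."""
    x = rng.integers(0, m, size=n)
    y = (rng.uniform(size=n) < eta[x]).astype(int)
    return x, y

def plug_in_estimate(n):
    """Holdout error of a majority-vote rule: an estimate of L_n, hence of L*."""
    x_tr, y_tr = sample(n)                         # training half
    x_te, y_te = sample(n)                         # holdout half
    ones = np.bincount(x_tr, weights=y_tr, minlength=m)
    counts = np.bincount(x_tr, minlength=m)
    g = (ones > counts / 2.0).astype(int)          # g_n(x): majority label in cell x
    return np.mean(g[x_te] != y_te)                # empirical error on the holdout set

for n in (100, 1_000, 10_000):
    print(f"n={n:6d}  L*={bayes_error:.4f}  estimate={plug_in_estimate(n):.4f}")
```

On this finite support the rule is consistent, so the holdout error converges to $L^*$; Theorems 1 and 2 say that no estimate can do so at a guaranteed rate uniformly over all such distributions.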


Similar Resources

Minimax Estimator of a Lower Bounded Parameter of a Discrete Distribution under a Squared Log Error Loss Function

The problem of estimating the parameter θ, when it is restricted to a lower-bounded interval, in a class of discrete distributions including the Binomial, Negative Binomial, discrete Weibull, and others, is considered. We give necessary and sufficient conditions under which the Bayes estimator of θ with respect to a two-point boundary-supported prior is minimax under the squared log error loss function....
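
For reference, the squared log error loss used in this abstract (and the next one) is commonly defined as follows; this definition is quoted from general knowledge of the loss, not from the paper itself:

```latex
% Squared log error loss for estimating \theta by \delta (\theta, \delta > 0):
L(\theta, \delta) = \bigl(\ln \delta - \ln \theta\bigr)^2
```

Minimizing the posterior expectation of this loss gives the Bayes estimate $\delta(x) = \exp(E[\ln\theta \mid x])$.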


Bayes, E-Bayes and Robust Bayes Premium Estimation and Prediction under the Squared Log Error Loss Function

In risk analysis based on the Bayesian framework, premium calculation requires the specification of a prior distribution for the risk parameter in the heterogeneous portfolio. When the prior knowledge is vague, E-Bayesian and robust Bayesian analysis can be used to handle the uncertainty in specifying the prior distribution by considering a class of priors instead of a single prior. In th...
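
As a reading aid for the class-of-priors device, the E-Bayesian construction averages the Bayes premium over a hyperprior on the prior's hyperparameters; this is stated from general knowledge of E-Bayesian estimation, not from the paper:

```latex
% E-Bayesian estimate: average of the Bayes estimate \delta_B(x; h) under the
% prior indexed by hyperparameter h, weighted by a hyperprior density \pi(h):
\delta_{EB}(x) = \int \delta_B(x; h)\, \pi(h)\, dh
```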


Polyshrink: An Adaptive Variable Selection Procedure That Is Competitive with Bayes Experts

We propose an adaptive shrinkage estimator for use in regression problems characterized by many predictors, such as wavelet estimation. Adaptive estimators perform well over a variety of circumstances, such as regression models in which few, some, or many coefficients are zero. Our estimator, PolyShrink, adaptively varies the amount of shrinkage to suit the estimation task. Whereas hard threshold...
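
For background on the hard-thresholding comparison the abstract is drawing, here is a minimal sketch of the two classical thresholding rules; this is generic shrinkage code, not the PolyShrink procedure itself:

```python
import numpy as np

def hard_threshold(z: np.ndarray, t: float) -> np.ndarray:
    # Keep a coefficient unchanged when |z| exceeds t; otherwise zero it out.
    return np.where(np.abs(z) > t, z, 0.0)

def soft_threshold(z: np.ndarray, t: float) -> np.ndarray:
    # Shrink every coefficient toward zero by t, clipping at zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
```

Hard thresholding is a keep-or-kill rule, while soft thresholding shrinks the survivors as well; adaptive procedures such as PolyShrink vary the amount of shrinkage with the data.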


On Bayes Risk Lower Bounds

This paper provides a general technique for lower bounding the Bayes risk of statistical estimation, applicable to arbitrary loss functions and arbitrary prior distributions. A lower bound on the Bayes risk not only serves as a lower bound on the minimax risk, but also characterizes the fundamental limit of any estimator given the prior knowledge. Our bounds are based on the notion of f-inform...
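
The truncated term is f-informativity; as a reading aid, Csiszár's definition is stated here from general knowledge, not from the paper:

```latex
% f-informativity of a prior \pi and a statistical model {P_\theta}, where
% D_f denotes the f-divergence and the infimum runs over all probability
% measures Q on the sample space:
I_f(\pi, \{P_\theta\}) = \inf_{Q} \int D_f(P_\theta \,\|\, Q)\, \pi(d\theta)
```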


Truncated Linear Minimax Estimator of a Power of the Scale Parameter in a Lower-Bounded Parameter Space

Minimax estimation problems with restricted parameter spaces have received increasing interest within the last two decades. Some authors derived minimax and admissible estimators of bounded parameters under squared error loss and scale-invariant squared error loss. In some truncated estimation problems, the most natural estimator to consider is the truncated version of a classic...
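
For a parameter bounded below by some $m$, the "truncated version" referred to here is typically the classical estimator projected onto the restricted parameter space; a sketch of this standard construction, with $m$ and $\delta$ as generic placeholders:

```latex
% Truncation of a classical estimator \delta(x) to the space [m, \infty):
\delta_m(x) = \max\{\delta(x),\, m\}
```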


Bounds on the Bayes and minimax risk for signal parameter estimation

In estimating the parameter θ from a parametrized signal problem (with 0 ≤ θ ≤ L) observed through Gaussian white noise, four useful and computable lower bounds for the Bayes risk were developed. For problems with different L and different signal-to-noise ratios, some bounds are superior to the others. The lower bound obtained by taking the maximum of the four serves not only as a good lo...
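
One classical, computable lower bound of this kind is the van Trees (Bayesian Cramér-Rao) inequality, stated here from general knowledge as an example; the paper's four bounds are not reproduced in this abstract:

```latex
% Van Trees inequality: prior density \pi on [0, L] vanishing at the endpoints,
% I(\theta) the Fisher information of the observation model:
E\bigl[(\hat{\theta} - \theta)^2\bigr] \;\ge\; \frac{1}{E_\pi[I(\theta)] + I(\pi)},
\qquad
I(\pi) = \int_0^L \frac{\pi'(\theta)^2}{\pi(\theta)}\, d\theta
```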



Journal:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume 21, Issue -

Pages -

Publication date: 1999